G Associates LLC

Spark Developer

Company: G Associates LLC

Location: Remote

Posted on: February 19

Job details

Pay

  • $49.48 - $65.00 an hour

Job type

  • Contract

Shift and schedule

  • 8 hour shift

Work setting

  • Remote

Full job description

Job Title: Spark Developer / Engineer (2 positions)

Location: US Remote, working Pacific Time (PST) hours

Duration: 6-12 Months

Workflows are powered by offline batch jobs written in Scalding, a MapReduce-based framework. To improve scalability and performance, the team is migrating these jobs from Scalding to Apache Spark.

Key Responsibilities:

Understanding the Existing Scalding Codebase

  • Analyze the current Scalding-based data pipelines.
  • Document existing business logic and transformations.

Migrating the Logic to Spark

  • Convert existing Scalding jobs into Spark (PySpark/Scala) while ensuring optimized performance.
  • Refactor data transformations and aggregations in Spark.
  • Optimize Spark jobs for efficiency and scalability.

Ensuring Data Parity & Validation

  • Develop data parity tests to compare outputs between Scalding and Spark implementations.
  • Identify and resolve any discrepancies between the two versions.
  • Work with stakeholders to validate correctness.

Writing Unit Tests & Improving Code Quality

  • Implement robust unit and integration tests for Spark jobs.
  • Ensure code meets engineering best practices (modular, reusable, and well-documented).

Required Qualifications:

  • Experience in big data processing with Apache Spark (PySpark or Scala).
  • Strong experience with data migration from legacy systems to Spark.
  • Proficiency in Scalding and MapReduce frameworks.
  • Experience with Hadoop, Hive, and distributed data processing.
  • Hands-on experience in writing unit tests for Spark pipelines.
  • Strong SQL and data validation experience.
  • Proficiency in Python and Scala.
  • Knowledge of CI/CD pipelines for data jobs.
  • Familiarity with the Apache Airflow orchestration tool.

Expected hours: 40 per week
